AITopics | intermediate state

Collaborating Authors

intermediate state

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

43da8cca8f14139774bcbd935d51e0f2-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 15:06:21 GMT

deq, gradient, robustness, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Austria (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Characterizing Pattern Matching and Its Limits on Compositional Task Structures

Chang, Hoyeon, Park, Jinho, Cho, Hanseul, Yang, Sohee, Ko, Miyoung, Hwang, Hyeonbin, Won, Seungpil, Lee, Dohaeng, Ahn, Youbin, Seo, Minjoon

arXiv.org Artificial IntelligenceNov-27-2025

Despite impressive capabilities, LLMs' successes often rely on pattern-matching behaviors, yet these are also linked to OOD generalization failures in compositional tasks. However, behavioral studies commonly employ task setups that allow multiple generalization sources (e.g., algebraic invariances, structural repetition), obscuring a precise and testable account of how well LLMs perform generalization through pattern matching and their limitations. To address this ambiguity, we first formalize pattern matching as functional equivalence, i.e., identifying pairs of subsequences of inputs that consistently lead to identical results when the rest of the input is held constant. Then, we systematically study how decoder-only Transformer and Mamba behave in controlled tasks with compositional structures that isolate this mechanism. Our formalism yields predictive and quantitative insights: (1) Instance-wise success of pattern matching is well predicted by the number of contexts witnessing the relevant functional equivalence. (2) We prove a tight sample complexity bound of learning a two-hop structure by identifying the exponent of the data scaling law for perfect in-domain generalization. Our empirical results align with the theoretical prediction, under 20x parameter scaling and across architectures. (3) Path ambiguity is a structural barrier: when a variable influences the output via multiple paths, models fail to form unified intermediate state representations, impairing accuracy and interpretability. (4) Chain-of-Thought reduces data requirements yet does not resolve path ambiguity. Hence, we provide a predictive, falsifiable boundary for pattern matching and a foundational diagnostic for disentangling mixed generalization mechanisms.

artificial intelligence, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2505.20278

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.93)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Denoising Refinement Diffusion Models for Simultaneous Generation of Multi-scale Mobile Network Traffic

Qi, Xiaoqian, Chai, Haoye, Liu, Sichang, Yue, Lei, Pan, Raoyuan, Wang, Yue, Li, Yong

arXiv.org Artificial IntelligenceNov-26-2025

The planning, management, and resource scheduling of cellular mobile networks require joint estimation of mobile traffic across different layers and nodes. Mobile traffic generation can proactively anticipate user demands and capture the dynamics of network load. However, existing methods mainly focus on generating traffic at a single spatiotemporal resolution, making it difficult to jointly model multi-scale traffic patterns. In this paper, we propose ZoomDiff, a diffusion-based model for multi-scale mobile traffic generation. ZoomDiff maps urban environmental context into mobile traffic with multiple spatial and temporal resolutions through a set of customized Denoising Refinement Diffusion Models (DRDM). DRDM employs a multi-stage noise-adding and denoising mechanism, enabling different stages to generate traffic at distinct spatiotemporal resolutions. This design aligns the progressive denoising process with hierarchical network layers, including base stations, cells, and grids of varying granularities. Experiments on real-world mobile traffic datasets show that ZoomDiff achieves at least an 18.4% improvement over state-of-the-art baselines in multi-scale traffic generation tasks. Moreover, ZoomDiff demonstrates strong efficiency and cross-city generalization, highlighting its potential as a powerful generative framework for modeling multi-scale mobile network dynamics.

artificial intelligence, machine learning, traffic, (19 more...)

arXiv.org Artificial Intelligence

2511.17532

Country:

Asia > China (0.48)
North America > United States (0.30)

Genre: Research Report > New Finding (0.46)

Industry:

Telecommunications (1.00)
Information Technology > Networks (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Treatment Stitching with Schrödinger Bridge for Enhancing Offline Reinforcement Learning in Adaptive Treatment Strategies

Shin, Dong-Hee, Lee, Deok-Joong, Son, Young-Han, Kam, Tae-Eui

arXiv.org Artificial IntelligenceNov-18-2025

Adaptive treatment strategies (ATS) are sequential decision-making processes that enable personalized care by dynamically adjusting treatment decisions in response to evolving patient symptoms. While reinforcement learning (RL) offers a promising approach for optimizing ATS, its conventional online trial-and-error learning mechanism is not permissible in clinical settings due to risks of harm to patients. Offline RL tackles this limitation by learning policies exclusively from historical treatment data, but its performance is often constrained by data scarcity-a pervasive challenge in clinical domains. To overcome this, we propose Treatment Stitching (TreatStitch), a novel data augmentation framework that generates clinically valid treatment trajectories by intelligently stitching segments from existing treatment data. Specifically, TreatStitch identifies similar intermediate patient states across different trajectories and stitches their respective segments. Even when intermediate states are too dissimilar to stitch directly, TreatStitch leverages the Schrödinger bridge method to generate smooth and energy-efficient bridging trajectories that connect dissimilar states. By augmenting these synthetic trajectories into the original dataset, offline RL can learn from a more diverse dataset, thereby improving its ability to optimize ATS. Extensive experiments across multiple treatment datasets demonstrate the effectiveness of TreatStitch in enhancing offline RL performance. Furthermore, we provide a theoretical justification showing that TreatStitch maintains clinical validity by avoiding out-of-distribution transitions.

machine learning, reinforcement learning, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2511.12075

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Information Technology (0.67)
Health & Medicine > Health Care Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis

Yang, Dong, Cai, Yiyi, Saito, Yuki, Wang, Lixu, Saruwatari, Hiroshi

arXiv.org Artificial IntelligenceOct-24-2025

We propose Shallow Flow Matching (SFM), a novel mechanism that enhances flow matching (FM)-based text-to-speech (TTS) models within a coarse-to-fine generation paradigm. Unlike conventional FM modules, which use the coarse representations from the weak generator as conditions, SFM constructs intermediate states along the FM paths from these representations. During training, we introduce an orthogonal projection method to adaptively determine the temporal position of these states, and apply a principled construction strategy based on a single-segment piecewise flow. The SFM inference starts from the intermediate state rather than pure noise, thereby focusing computation on the latter stages of the FM paths. We integrate SFM into multiple TTS models with a lightweight SFM head. Experiments demonstrate that SFM yields consistent gains in speech naturalness across both objective and subjective evaluations, and significantly accelerates inference when using adaptive-step ODE solvers. Demo and codes are available at https://ydqmkkx.github.io/SFMDemo/.

artificial intelligence, machine learning, speech synthesis, (17 more...)

arXiv.org Artificial Intelligence

2505.12226

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)

Add feedback

Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking

Chao, Chen-Hao, Sun, Wei-Fang, Liang, Hanwen, Lee, Chun-Yi, Krishnan, Rahul G.

arXiv.org Artificial IntelligenceOct-23-2025

Masked diffusion models (MDM) are powerful generative models for discrete data that generate samples by progressively unmasking tokens in a sequence. Each token can take one of two states: masked or unmasked. We observe that token sequences often remain unchanged between consecutive sampling steps; consequently, the model repeatedly processes identical inputs, leading to redundant computation. To address this inefficiency, we propose the Partial masking scheme (Prime), which augments MDM by allowing tokens to take intermediate states interpolated between the masked and unmasked states. This design enables the model to make predictions based on partially observed token information, and facilitates a fine-grained denoising process. We derive a variational training objective and introduce a simple architectural design to accommodate intermediate-state inputs. Our method demonstrates superior performance across a diverse set of generative modeling tasks. On text data, it achieves a perplexity of 15.36 on OpenWebText, outperforming previous MDM (21.52), autoregressive models (17.54), and their hybrid variants (17.58), without relying on an autoregressive formulation. On image data, it attains competitive FID scores of 3.26 on CIFAR-10 and 6.98 on ImageNet-32, comparable to leading continuous generative models.

large language model, machine learning, mdlm-prime, (17 more...)

arXiv.org Artificial Intelligence

2505.18495

Country: North America > Canada > Ontario (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Fine-tuning Flow Matching Generative Models with Intermediate Feedback

Fan, Jiajun, Cheng, Chaoran, Shen, Shuaike, Zhou, Xiangxin, Liu, Ge

arXiv.org Artificial IntelligenceOct-22-2025

Flow-based generative models have shown remarkable success in text-to-image generation, yet fine-tuning them with intermediate feedback remains challenging, especially for continuous-time flow matching models. Most existing approaches solely learn from outcome rewards, struggling with the credit assignment problem. Alternative methods that attempt to learn a critic via direct regression on cumulative rewards often face training instabilities and model collapse in online settings. We present AC-Flow, a robust actor-critic framework that addresses these challenges through three key innovations: (1) reward shaping that provides well-normalized learning signals to enable stable intermediate value learning and gradient control, (2) a novel dual-stability mechanism that combines advantage clipping to prevent destructive policy updates with a warm-up phase that allows the critic to mature before influencing the actor, and (3) a scalable generalized critic weighting scheme that extends traditional reward-weighted methods while preserving model diversity through Wasserstein regularization. Through extensive experiments on Stable Diffusion 3, we demonstrate that AC-Flow achieves state-of-the-art performance in text-to-image alignment tasks and generalization to unseen human preference models. Our results demonstrate that even with a computationally efficient critic model, we can robustly finetune flow models without compromising generative quality, diversity, or stability.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.18072

Country:

North America > United States (1.00)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance

Zhong, Jincheng, Jiang, Boyuan, Tao, Xin, Wan, Pengfei, Gai, Kun, Long, Mingsheng

arXiv.org Artificial IntelligenceOct-15-2025

Existing denoising generative models rely on solving discretized reverse-time SDEs or ODEs. In this paper, we identify a long-overlooked yet pervasive issue in this family of models: a misalignment between the pre-defined noise level and the actual noise level encoded in intermediate states during sampling. We refer to this misalignment as noise shift. Through empirical analysis, we demonstrate that noise shift is widespread in modern diffusion models and exhibits a systematic bias, leading to sub-optimal generation due to both out-of-distribution generalization and inaccurate denoising updates. To address this problem, we propose Noise Awareness Guidance (NAG), a simple yet effective correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. We further introduce a classifier-free variant of NAG, which jointly trains a noise-conditional and a noise-unconditional model via noise-condition dropout, thereby eliminating the need for external classifiers. Extensive experiments, including ImageNet generation and various supervised fine-tuning tasks, show that NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.12497

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability

Yamabe, Shojiro, Sakuma, Jun

arXiv.org Artificial IntelligenceOct-2-2025

Diffusion language models (DLMs) generate tokens in parallel through iterative denoising, which can reduce latency and enable bidirectional conditioning. However, the safety risks posed by jailbreak attacks that exploit this inference mechanism are not well understood. In this paper, we reveal that DLMs have a critical vulnerability stemming from their iterative denoising process and propose a countermeasure. Specifically, our investigation shows that if an affirmative token for a harmful query appears at an intermediate step, subsequent denoising can be steered toward a harmful response even in aligned models. As a result, simply injecting such affirmative tokens can readily bypass the safety guardrails. Furthermore, we demonstrate that the vulnerability allows existing optimization-based jailbreak attacks to succeed on DLMs. Building on this analysis, we propose a novel safety alignment method tailored to DLMs that trains models to generate safe responses from contaminated intermediate states that contain affirmative tokens. Our experiments indicate that the proposed method significantly mitigates the vulnerability with minimal impact on task performance. Furthermore, our method improves robustness against conventional jailbreak attacks. Our work underscores the need for DLM-specific safety research.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.00565

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Add feedback

A Weighted Gradient Tracking Privacy-Preserving Method for Distributed Optimization

Xie, Furan, Liu, Bing, Chai, Li

arXiv.org Artificial IntelligenceSep-24-2025

This paper investigates the privacy-preserving distributed optimization problem, aiming to protect agents' private information from potential attackers during the optimization process. Gradient tracking, an advanced technique for improving the convergence rate in distributed optimization, has been applied to most first-order algorithms in recent years. We first reveal the inherent privacy leakage risk associated with gradient tracking. Building upon this insight, we propose a weighted gradient tracking distributed privacy-preserving algorithm, eliminating the privacy leakage risk in gradient tracking using decaying weight factors. Then, we characterize the convergence of the proposed algorithm under time-varying heterogeneous step sizes. We prove the proposed algorithm converges precisely to the optimal solution under mild assumptions. Finally, numerical simulations validate the algorithm's effectiveness through a classical distributed estimation problem and the distributed training of a convolutional neural network.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.18134

Country: Asia > China (0.28)

Genre: Research Report (0.90)

Industry:

Information Technology > Security & Privacy (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.82)

Add feedback